Likelihood-Free Parallel Tempering. 
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Abstract 



Approximate Bayesian Computational (ABC) methods, or likelihood-free methods, have 
appeared in the past fifteen years as useful methods to perform Bayesian analysis when 
the likelihood is analytically or computationally intractabl e. Several ABC metho ds have 
been propose d: MCMC methods have been developed by iMarioram et al.l |2003l | and by 



Bortot et al.l 120071 for in st ance, and sequential m etho ds have been proposed among oth 



ers bv ISisson et al.l |2007| . iBeaumont et all |2009| and iDel Moral et all [2009j . Recently, 
sequen tial ABC methods hav e appeared as an alt ernative to ABC-PMC methods [see for in- 
stance iMcKinlev et all 120091 ISisson et all |2007| . In this paper a new algorithm combining 
population-based MCMC methods wi th ABC requ irements is proposed, using an analogy 
with the parallel tempering algorithm [Geyeii Il99l| . Performance is compared with existing 
ABC algorithms on simulations and on a real example. 

Keywords: Approximated Bayesian Computational, likelihood-free, intractable likelihood, paral- 
lel tempering, population-based, Monte Carlo Markov chain. 
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1 Introduction 



The principle of Approximate Bayesian Computation (ABC) algorithms is to evaluate the pos- 
terior density when the likelihood function is intractable. More precisely, let x G ID C R n be an 
observed vector, 9 £ M, d a parameter of interest, and 7r(.) the prior distribution of 9. The aim of 
an ABC algorithm is to evaluate the following posterior density: 

tt(9 I x) oc f{x I 0)tt(0). 

The likelihood f(x \ 0) is supposed to be analytically or computationally intractable. Hence 
the first requirement is that its calculation is not necessary. The second requirement is the rel- 
ative ease of simulation from the model, given a parameter 9. These requirem ents are fulfilled 



in several areas in which AB C methods have been u sed, including gene tics [Pritchard et al 



19991. Leuenberger et al. . 2010], extre me values theor y Bortot et al. . 2007|, Gi bbs random fields 



Grelaud et aIU 2009] or epidemiology jTanaka et all I2006I . ISisson et all |2007| . 



The first ABC algorithm has been proposed bv lTavare et al.l 1997], and is simply a rejection 
algorithm. It consists in generating jointly 9' ~ it and z' ~ f(.\9'). The generated 9' is accepted if 
the equality z' = x holds, and is then denoted by 9\. This process is repeated until n generated 9' 
have been accepted. The accepted {6\, 82, ■ ■ ■ , 8 n } form an i.i.d. sample from the target posterior 
ir(. I x), that is why this a lgorithm is often called the exact ABC algorithm. It has been extended 
bv lPritchard et al.l |l999| in order to be applied in continuous state spaces. The idea is to accept 
a simulated z' if it is sufficiently close to the observed x. The condition z' = x is replaced by 
a distance p(z',x) < e, where p(-,.) is a one dimensional function and e is a tolerance level. In 
case of large datasets z' and x, it can be computationally demanding to calculate p{z' , x). Hence 
a second refinement is to compare summary statistics instead of the complete datasets, and the 
condition becomes p(S (z'), S(x)) < g, with S a statistic which can be multi-dimensional. The 
algorithm proposed by Pritchard et al. 1999] includes these two refinements and is considered 
as the standard of the ABC algorithms. This algorithm gives an i.i.d. sample from the following 
joint posterior distribution: 



7T« 



{Z,9 I X) OC TT(9)f(z I 9)l {p ( S (z),S(x))<e}(z)- 



(1) 



It is usually the marginal in 9 which is of interest; that is ir £ (9 \ x) oc J ir e (z, 9 \ x)dz. If S provides 
a good summary of the data and if e is sufficiently small, then ir e (9 \ x) can be considered as a 
reasonable approximation of the target posterior ir(9 \ x). Of course, the quality of the obtained 
approximation depends on the calibration of the algorithm by choosing e, p(., .) and S. 

The acceptance rates of the standard ABC algorithm can be very low, hence several improve- 
ments have been proposed, using MCMC or sequential schemes. 



1.1 ABC-MCMC schemes 



Algor ithms have been proposed bv lMarioram et al.l 2003l.lBortqt et al.l 20071 ] and lRatmann et al 



2007], the reference method being the one of iMarioram et al.l 20031 ] who suggested the use of 



a Markov kernel q(.\.) to propose a new parameter from a previous accepted one. The validity 
of this algorithm called ABC-MCMC is ensured, as the generated Markov chain (9^\z^) is er- 
godic, and its invariant density is the target ir £ (z, 9 \ x). The acceptance rates of this algorithm 
are higher than those obtained with a standard ABC algorithm, but two drawbacks are observed: 
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In case of small e the sampler can encounter difficulties to move in the parameter space. 
Indeed, the acceptance rate is proportional to the probability of simul ating a z' su c h tha t 



p(S(z'), S(x)) < e, which is itself proportional to the likelihood [see ISisson et all |2007| . 
This probability is relatively low if e is small, and in practice when the ABC-MCMC 
enters an area of relatively low probability, the acceptance probabilities to move elsewhere 
are quite small and the sampler encounters difficulties to move from this part of the space. 
As a consequence, not only the sampler sticks in the distribution tails, but also these tails 
are not well visited. 

The samples obtained are dependent, and often highly correlated. This is inherent in many 
methods based on MCMC, and not necessarily a problem, but if one needs non-correlated 
particles, sometimes thinning of the chain should be done. 



Modified toy example and ABC-MCMC 
To illustrate the behavior of the ABC-MCM C in distribut i on ta il s, we use a modi fi ed to y ex- 

" 12007 



ample inspired f rom t he example studied in ISisson et al. 



Beaumont et al.l [2009l | and 



Del Moral et~aH |2009j . The prior for 9 is uniform 1A[— 10, 10), the model is a mixture of Gaus- 
sian distributions given by x \ 9 ~ 0A5J\f(9, 1) + 0A5Af(0, 1/100) + 0AJ\f(9 - 5, 1) (1 and 1/100 
denoting the variances), and the posterior distribution associated with an observation x = is 
the mixture: 



x = ~ 

0.45A/-(0, 1) + 0.45A/-(0, + O.W(5, l))l hlo ,io] 

Using S(x) = x and p(x,z) = \z — x\, the approximation of this posterior ir( 
written as: 



x = 0) can be 



n £ (6 \x = 0) oc 



0.45 
+0.1 



*(e-0) + $(lO(e-0))-#(-e 
4>(e-6> + 5) -4>(-e-0 + 5) 



S(lO(-e-0)) 



where $ denotes the distribution function of a A/"(0, 1). Using a tolerance level e = 0.025, the 



appro xima ted posterior is indistin guishable from the exact posterior density. As in ISisson et al. 
2007| and iBeaumont et al.l [2009( | . we used a M{9 { ^- 1 \ 0.15 2 ) as the proposal q(. \ 9^). The 



algorithm was run on 20. 10 6 iterations with a burn-in of 5.10 6 , and it appears that for some runs 
the chain did not succeed to visit the small mode around 5, while for others it stayed stuck in 
this local mode and does not visit well the global mode around 0. Figure Q] shows the results 
of two runs. It is clear that these results are not satisfactory. As a comparison, the population 
obtained by a standard ABC is given in Figure [2] It is quite good, hence the issue observed in the 
distribution tail is specific to MCMC, it is not due to the ABC approximation of the likelihood. 



1.2 Contribution of the paper 

Until now ABC methods using MCMC schemes were not c ompletely sat i sfacto r y. Hence sequen 



tial a lgorithms have been proposed by different authors: 
2009t | (simila r algorithms have been proposed by ISisson et al 



Sisson et al.1 (2007 1 



2009l | and lToni et al 



Beaumont et al. 
2QQ gj), and 



^recently EStoL^] S (a similar algonthm has be^oooseTl ^D^va.Hi and Pettittl 
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Figure 1: ABC-MCMC for the modified toy example: samples obtained by two runs (20. 10 6 
iterations with 5.10 6 iterations of burn-in) for tolerance levels of 0.025. The orange dotted curves 
stand for the exact posterior distribution, the green dotted curves for the target approximation, 
and densities of obtained samples are represented in gray. 




Figure 2: Modified Toy Example: population associated with a tolerance level of 0.025 (5000 
samples) obtained with a standard ABC. The orange dotted curve stands for the exact posterior 
distributions, the green dotted curve for the target approximation, and the weighted population 
are represented in gray. In this example the acceptance rate is of 3%. 



201l|| ) . The ABC-PMC of iBeaumont et al.1 [20091 ] is simple to use and to implement, and it 
gives non-biased r esults from the approx imate posterior ir e (0 \ x). A more recent algorithm is 



the ABC-SMC of bel Moral et al.l [2009]: both tolerance levels and transition kernels are au- 



tomatically calibrated at each iteration, and several datasets are generated given a proposed 
parameter. These sequential ABC schemes are now very popular. They compare favorably to 
MCMC schemes, but they have not been compared to MCMC schemes using a population of 
chains. 

In this paper we are interested in developing such a method, by doing an analogy with the Paral- 
lel Te mpering algorithm used in classical MCMC methods jGevei . 1991 . Gever and Thompson . 
1995]. In particular, we want to know if using a population-based approach the drawback of 
ABC-MCMC schemes concerning distribution tails can be avoided. Moreover, we would like to 
compare the behavior of this approach with ABC-PMC and ABC-SMC algorithms, especially in 
distribution tails. 



The paper is organized as follows. In Section 2 we introduce the new population-based ABC- 
MCMC scheme, that we call ABC-PT algorithm. In Section 3 this algorithm is illustrated on the 
modified toy example and on real data, and comparisons with the ABC-MCMC, ABC-PMC and 
ABC-SMC algorithms are done. Finally, the method and the results are discussed in Section 4. 



2 The ABC-PT algorithm 

2.1 Standard Parallel Tempering (PT) 

When the goal is to sample from a multi-modal target density it, c l assica l MCMC methods 
like Metropolis-Hast ings algorithm [Metropolis et alffl Ern^ . S 

or Gibbs sampler 

Gelfand and Smithl . Il990l | for instance, are often trapped into local modes from where they 



cannot escape in reasonable time. To avoid this problem, the principle of PT is to introduce 
N temperatures and to run in parallel N associated MCMC chains with target distributions 
being tempered distributions of the target n. The first chain targets w, and since the tempered 
distributions become flatter as the temperature increases, the chains at high temperatures can 
move easily between modes and explore the whole parameter space. Each iteration of the PT 
algorithm is decomposed into two types of moves: local moves via classical MCMC algorithms 
to update the different chains, and global moves allowing swaps between two chains. The use of 
these swaps enables new modes to be propagated through the different chains, thereby improving 
mixing. The first chain associated with the target distribution will then be able to escape from 
local modes. 



2.2 Analogy with the PT in an ABC framework 

The analogy with the PT is the following: a population of N chains is used, chains of higher 
orders should be able to move easily in the parameter space, and chains of lower orders should 
give precise approximations of the target posterior. These last chains can have some difficulties 
to exit local modes, so exchange moves between chains should be defined so that the chains of 
lower orders can visit all the parameter space, and hence can exit low probability areas. 
To achieve this in an ABC framework, chains of higher orders are associated with high tolerance 
levels, while chains of lower orders are associated with low tolerance levels. A sequence of 
tolerance levels is then defined as e = {e\, £2, • • • , £/v), with s\ < £2 < ■■■ < £/v- The ith 
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chain is associated with £j. In particular the first chain is associated with E\ the tolerance level 
required for the approximation of the posterior of interest, and the chain associated with the 
higher tolerance level e^r should be able to exit areas of low posterior probabilities. 

The set of parameters associated to the N chains is written 6 = (61, 02, ■ ■ ■ ,8n) with 0j £ 
C H rf , and z = (zi, Z2, ■ ■ ■ , zjv) denotes the associated simulated datasets. The state associated 
to the ith chain is (zt, 6i), for % £ {1, . . . , N}. 



2.2.1 Local moves 

Each chain is locally updated using an iteration of an ABC-MCMC algorithm, with associated 
tolerance level and transition kernel. In order to improve the exploration of the parameter space 
by higher order chains, the transition kernels associated with these chains should enable large 
moves. As the tolerance levels associated with these chains are high, the probabilities to accept 
the proposed moves can remain reasonably high. On the opposite, transition kernels associated 
with lower order chains should propose smaller moves, to have more chance to accept the proposed 
moves. We then propose to use tempered transition kernels, using a sequence of temperatures 



T*i = 1 < T2 < . . . < Tjy (such tempered transitions kernels have been used by iRatmann et al. 



2007]). In particular the ith chain is associated with £j, Tj and a symmetric transition kernel 
qi(. I .). For instance | .) can be a Gaussian transition kernel tempered by Tj. This ith chain 
is associated with the following joint distribution: 

ir £t (zi,9i I x) oc n(9i)f(Zi \ 9i)l{ p (s(z i ),S(x))<e i }(Zi)- 

We also introduce: 

N 



vr*(z,0 J x) = Y[7r ei (Zi,ei I x). 



i=l 



2.2.2 Exchange moves 

To perform an exchange move between two chains, a pair of chains is chosen uniformly. The 
orders of these chains are denoted by i and j {i < j), and the states proposed by the exchange 
move are written 6' and z' . We denote by q(z',6' \ z,9) the probability of proposing (z',0') 
from (z,6). We propose to exchange the state (zi,9i) of chain i with the state (zj,6j) of chain 
j. We then have: 

6' = . . . , 0j, . . . , 0i, . . . , 0jv), 

Z = (Zl, — , Zj, — , Zi, — , Zn"). 

The acceptance probability to ensure reversibility of the move will be the following (using the 
classical detailed balance condition): 

7r* £ (z',e'\x)q(z,e\z',e') 

a = 1 A 



tt*(z,6 I x)q(z',6' \z,0) ' 



, -Ktiz'e' x) 

1 A (21 

TT*( Z ,e X) 
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Indeed, as the pair of chains is chosen uniformly among all the possible pairs, it is clear that 
q(z, 6 | z', 6') = q(z', 6' \ z, 6). In addition we have: 



TT*(Z,6 i 



TT £i (zi,6i | x)ir Ej (Zj,6j | x)' 

f( z 'i I 9j)t{ p (S(z' i ),S(x))<e l }( z i) 
f(Zi I 9i)^{p(S(z t ),S(x))<s t }(Zi) 

f(z'j I ^)l{p(S(^.),SW)<^}(4) 
/(^' I ^•)l{p(5(* i ),5( ! r))<s i }(2j)' 
1 {p(S(2: J -),S(3 ; ))<e < }(^) :IL {/)(5(2: < ),S( a; ))<e J -}(^) 

l{p(S( Zi ),5( :c ))< £< }(^) 1 {p(5fe),5( :t; ))< £:/ }(^) 



(3) 



By construction e, < e,-, and 
1 {p(S(^),5(x))< £l }( 2;i ) :11 {p(S( Zj ),S( :C ))< ej }(^) = I- 

Therefore ([3]) simply reduces to ^-{ p (s(z),S(x))<6i}( z j) > an d the exchange moves is accepted if 
{p(5(2j), S(x)) < Ei}. This exchange move satisfies ABC requirements as there is no need to 
compute any likelihood. We can note that this acceptance rate depends only on the tolerance 
level £j. 

2.3 ABC-PT algorithm 

Define a sequence of tolerance levels e\ < £2 < • • • < En, and a sequence of temperatures 
T 1 = KT 2 < ... <T N . 

ABC-PT 

To obtain a Markov chain of length n: 

1. t = l. Use the standard ABC algorithm to obtain A realizations: 
For i in {1, . . . , A}, 

generate (9^, 2} ) from tt £ .(z,9\x). 

Set e^ = (9?\9?\...,9^) and *(<>) = (,f \ ,f , . . . , $)) . 

2. For t in {2, . . . , n} : 

(a) LOCAL MOVES ABC-MCMC: For % in {1,...,N}: 

i. Generate 6[ from | 0^* ^) depending on Tj. 
ii. Generate z\ given 9[, from /(. | 9^). 

Set (6'f ) ,4* ) ) = (Oi> z 'i) with probability 



7 



Else set ((f, zf) = #~») . 



(b) EXCHANGE MOVES: 

N exchange moves are proposed: 

iV pairs of chains are chosen uniformly in all possible pairs with replacement. 
For each of them: Denote by i and j (i < j) the orders of a chosen pair 
of chains . 

Exchange (zi,9i) with (zj,6j) 

if <£;. 



At each iteration ./V exchange moves are proposed. Indeed, it appears from our experience 
that only one proposition is not sufficient to enable a good mixing of the first chain of interest. 
Moreover increasing the number of exchange moves improves the mixing with the first chain of 
interest and allows to obtain less correlated samples. 

2.4 Improving acceptance rates in ABC-PT 

In order to increase the acceptance rate of the exchange moves in the ABC-PT algorithm, we 
consider a if-partition of the tolerance levels space, for a fixed integer K. In this way we obtain 
K disjoint subsets E\,--- ,Ek, that we shall call rings. A chain of order i is associated to a 
ring Ej if p(S(zi), S(x)) € Ej. In the exchange move part of the ABC-PT algorithm, two chains 
are then chosen such that they are associated to the same ring. Thus, the second part of the 
algorithm is modified as follows: 

(b) EXCHANGE MOVES WITH RINGS: 
N exchange moves are proposed: 

For each exchange move a ring with at least two associated chains is chosen randomly 
and two associated chains are chosen uniformly in this ring. 

For each of them: Denote by % and j (i < j) the orders of a chosen pair of chains. 
Exchange (zi,9i) with (zj,9j) if p(S(zj), S(x)) < £» . 

In practice, taking into account these rings, if an exchange between two selected chains i 
and j is proposed, then the acceptance probability ^-{ p (s{zj),S{x))<Ei}{ z j) ls generally greater than 
in the case where the two chains are chosen uniformly (without rings). Note that this idea of 
partitioning can be related to the construction of rings of energy of the states space proposed in 



Kou et all (20061 ] 



3 Illustrations on modified toy example 
3.1 ABC-PT 

We use the modified toy example presented in the Introduction to study the behavior of the ABC- 
PT algorithm, especially in distribution tails and in local posterior modes. We take N = 15 
chains, and imitating the ABC-MCMC algorithm, the tempered transition kernels used are 
qi (.\df~ l) ) ~ AT(0f" 1) ,O.15 2 r 4 ), i = 1,...,N. Concerning the sequence of temperatures, we 
choose T\ = 1 and T/v = 4. The other temperatures can be chosen by evenly spacing them 
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Figure 3: ABC-PT for the modified toy example: approximations obtained 

by each of the N chains, the associated sequence of tolerance levels is e = 

(2, 1.4625, 1.0694, 0.7820, 0.5719, 0.4182, 0.3058, 0.2236, 0.1635, 0.1196, 0.0874, 0.0639, 0.0468, 0.0342, 0.025), 

for 600 000 iterations with a burn-in of 150 000. The orange dotted curves stand for the exact 

posterior distribution, the green dotted curves for the target approximation, and densities of 

obtained samples are represented in gray. 



e 


2 


0.78 


0.31 


0.12 


0.047 


0.025 


Local acceptance rates 
Accepted exchange moves 


65.8% 
106 711 


45.6% 
171 678 


26.0% 
191 311 


12.5% 
187 557 


5.4% 
156 676 


3.0% 
107 330 



Table 1: Modified toy example: acceptance rates for local moves and number of accepted ex- 
change moves, for chains 15, 12, 9, 6, 3 and 1 of the ABC-PT. 



on a logarithmic scale, by eve nly spacin g thei r inverses, or by evenly spacing their iny erses 
geometrically [see for instance Kou et al. . 20061 . Nagata and Watanabe . 20081 . Neal . 19961 ] . In 
this example we decide to space them on a logarithmic scale. Concerning the tolerance levels, 
once E\ and £/v are chosen, it gives satisfactory results to c hoose the others by usi ng a sequence of 
proportionality constants [approach similar to the one of Beaumont et al. . 20091 ] or by regularly 
spacing them on a logarithmic scale (which is a special case of a sequence of proportionality 
constants). In this example they are chosen evenly spaced on a logarithmic scale between E\ = 
0.025 and ejsf = 2. We checked that the highest order chain associated with T/v = 4 and en = 2 
easily explores the parameter space. The algorithm is run for 600 000 iterations with a burn-in 
period of 150 000 iterations. 

Figures [3] and U] show the obtained approximations 7T ei (.[aj) for the N chains in parallel. They 
are satisfactory, the two modes being well visited. Table [T] presents the acceptance rates and 
the numbers of accepted exchange moves for six of the 15 chains: on average, 2.05 moves are 
accepted per iteration (out of 15). 
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Figure 4: ABC-PT for the modified toy example: approximations obtained by the chain asso- 
ciated with tolerance level of 0.025 (600 000 iterations with a burn-in of 150 000). The orange 
dotted curves stand for the exact posterior distribution, the green dotted curves for the target 
approximation, and densities of obtained samples are represented in gray. 



Compared with the approximations obtained with the ABC-MCMC algorithm, the results 
obtained with ABC-PT are greatly improved in terms of exploration of the parameter space. 
Indeed, the first chain of interest succeeds to visit the small mode around 5, and it is not trapped 
in this local mode. Moreover, the results of different runs appeared very similar indicating that 
the results of ABC-PT are reproducible. Table [2] shows that there is also an improvement in 
terms of autocorrelations. In Figure [5] the traces of the two algorithms are given, and it is 
observed that the mixing of the chain is improved by ABC-PT compared to ABC-MCMC. 





thinning 


Nb of samples 


Order 1 


Order 10 


Order 20 




none 


15.10 6 


1 


1 


1 


ABC-MCMC 


100 


15. 10 4 


0.999 


0.994 


0.990 




1000 


15. 10 3 


0.994 


0.968 


0.951 




none 


45. 10 4 


0.842 


0.367 


0.284 


ABC-PT (chain 1) 


10 


45. 10 3 


0.361 


0.198 


0.142 




50 


9.10 3 


0.215 


0.093 


0.082 



Table 2: Modified toy example: autocorrelations of order 1, 10 and 20 of the chain obtained by 
ABC-MCMC, and of the first chain obtained by ABC-PT (with or without thinning the chains). 
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CCs-pOO 5.0^+06 1 .0-s+07 1 _5-e+07 Oe+00 1e-H35 2e+05 3e+05 4a-ffl5 

iterations Iterations 



Figure 5: Modified toy example: On the left the trace of a chain obtained by ABC-MCMC 
(example corresponding to the right part of Figure [TJ , on the right the trace of a first chain 
obtained by ABC-PT. 



3.2 Comparison with ABC-PMC 

The ABC-PMC of lBeaumont et all |200gj is run on the modified toy exa mple. The decreasing se- 



quence of tolerance levels used is the same than in lBeaumont et al.l [2009], that is (2, 1.5, 1, 0.5, 0.1, 0.025) 



Six weighted populations of size 5000 are generated, and represented in Figure [6j The results 
are much better than those obtained by the ABC-MCMC. In particular, the sampler is not stuck 
in distribution tails. However it seems that as the tolerance level decreases, the tails are under- 
covered and in particular the small mode around 5, see the left of Figure [3 That is also observed 
for the standard toy example (without the small mode around 5), see the right of Figure [71 

Remark 3.1 The ABC-PMC algorithm has been run with the same sequence of tolerance levels 
than those used for the ABC-PT: 15 populations were generated, hence it took more time than a 
run in which only a sequence of six tolerance levels is used. It appeared that the 15th weighted 
population associated with £\ = 0.025 did not cover better the local mode around 5. Moreover 
the diversity of this population was decreased, as its Effective Sample Size was of 3858.295, 
compared to 4164-55 for the 6th population when six populations were generated. Note that like 
the ABC-PT, the results of the ABC-PMC were reproducible. 

3.3 Comparison with ABC-SMC 



We a lso compare the results of the ABC-PT with those obtained with the ABC-SMC Del Moral et all 



2009]. We take e = 0.025 and a = 0.95. To run ABC-SMC and ABC-PMC with similar CPU 



times, we first take M = 1 and 15 000 particles, see the left of Figure [8j The Effective Sample 
Size is of 11242. We also run the ABC-SMC for M = 10 and 20 000 particles, see the right 
of The Effective Sample Size is of 14666.98. Generally, it appears that the results of the 
ABC-SMC are reproducible and similar to those obtained by the ABC-PT. However, we can 
note two differences: using the ABC-SMC the global mode around is slightly under-covered, 
and the ABC-SMC has the advantage to give non-correlated samples. 
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Figure 6: ABC-PMC for the modified toy example: the six populations are associated with 
tolerance levels of 2, 1.5, 1, 0.5, 0.1 and 0.025 (5000 samples). The orange dotted curves stand 
for the exact posterior distribution, the green dotted curves for the target approximation, and 
the weighted populations are represented in gray. 




Figure 7: ABC-PMC zoom: populations associated with a tolerance level of 0.025 (5000 samples), 
for the modified toy example on the left, and for the standard toy example on the right. The 
orange dotted curves stand for the exact posterior distributions, the green dotted curves for the 
target approximation, and the weighted populations are represented in gray. 
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Figure 8: ABC-SMC for the modified toy example, for a tolerance level of 0.025: on the left the 
results for M = 1 and 15 000 particles, on the right the results for M = 10 and 20 000 particles. 
The orange dotted curves stand for the exact posterior distribution, the green dotted curves for 
the target approximation, and densities of obtained samples are represented in gray. 



Remark 3.2 In an ABC framework, all sequential methods require simulating a number of 
pseudo-observations proportional to the number of particles. But their computational costs dif- 
fer concerning the calculation of some importance weigh ts: this cost is quadratic in t he nu mber 



of particles for th e algorithms o nBeaumont et al. fmM] (ABC-P MC). \Sisson et all 



Toni et al. 200x1 , whil e it is linear for the adaptive ABC-SMC of Del Moral et al. 



'2001] and 



'2009,} [see 



also lBeskos et all ItZOllj . Comparison with the ABC-MCMC or the ABC-PT is not easy. Indeed, 
MCMC methods give unweighted correlated samples, hence the use of the effective sample size 
is not relevant. To compare the sequential and MCMC approaches, comprehensive convergence 
studies of the MCMC methods in an ABC framewo rk should be achieved , taking into account the 
number N of chains, the temperature spacings [see Atchade et al. . 2010\] , as well as the autocor- 
relations. 



3.4 Comparison with N independent parallel chains 

Finally we compare the results of the ABC-PT with those obtained with N independent parallel 
chains. The 15 independent chains are locally updated by an ABC-MCMC algorithm during 600 
000 iterations with a burn-in period of 150 000 iterations, with e = 0.025 and J\f(9^ t ~ 1 \ 0.15 2 ) 
as the proposal q{. \ O^ 1 ^). Figure [9] shows on three runs that the results obtained for the N 
independent chains are not reproducible: the run represented on the left enables to detect the 
small mode around 5 (which is a little bit over-covered), the run on the middle detect the small 
mode which is quite under-covered, while the run on the right did not detect it at all (none of the 
15 chains succeeded to visit this mode). The traces of each of the 15 chains are similar to those 
of the ABC-MCMC algorithm, see the left of Figure [5j On the opposite, it appears that thanks 
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Figure 9: Modified toy example: three runs of 15 independent parallel chains with tolerance 
levels of 0.025 (600 000 iterations with a burn-in of 150 000 for each chain, hence we obtain 6 
750 000 post-burn-in samples). 





thinning 


Nb of samples 


Order 1 


Order 10 


Order 20 




none 


45. 10 4 


0.622 


0.264 


0.238 


ABC-PT (chain 1) 


10 


45. 10 3 


0.268 


0.148 


0.111 




50 


9.10 3 


0.193 


0.070 


0.048 



Table 3: Modified toy example: autocorrelations of order 1, 10 and 20 of the first chain obtained 
by ABC-PT with 3 rings of tolerance levels (with or without thinning the chains). 



to the exchange moves, the ABC-PT always succeeds to visit the small mode, and to escape it. 
3.5 ABC-PT using rings 

We illustrate the approach with rings on the modified toy example. As previously we use N = 15, 
T/v = 4, ei = 0.025 and e^v = 2. We choose K = 3 rings containing the same number of tolerance 
levels. We get E x = [0, 0.103], E 2 =]0.103, 0.495], and E 3 =]0.495,2]. The algorithm is run for 
600 000 iterations with a burn-in period of 150 000 iterations. The results are given in Figure [TU1 
Compared to the ABC-PT without rings, we observe that for an average of about +13% of CPU 
times, an increase of about +83% of accepted exchange moves is obtained. All the N(N — l)/2 
exchange rates between chains increase with the use of rings, and especially the rates between 
chains with high temperatures. Matrices of accepted exchange rates obtained on two runs of 
ABC-PT (with or without rings) are given in Appendix (Tables [7] and [8]) . An improvement in 
terms of autocorrelation is also observed as shown in Table |3j 
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Figure 10: Approximations obtained by the first chain after an ABC-PT algorithm with N = 15 
chains, with 3 rings of tolerance levels (600 000 iterations with a burn-in of 150 000). The orange 
dotted curves stand for the exact posterior distribution, the green dotted curves for the target 
approximation, and densities of obtained samples are represented in gray. 



4 Illustration on tuberculosis data 



We co nsid er the real ex ample of tuberculosis transmission studied bv lTanaka et al.l 2006j | . ISisson et al 
2007l | and|Bhun| (201(1 . The data c ome f rom a study of a tuberculosis epidemic in San Francisco 
during 1991 and 1992 [Small et all Il994j |. They consist of DNA fingerprint at the IS6110 marker 



for 473 isolates, which are grouped in 326 distinct genotypes represented as follows: 



30 1 23 1 15 1 10 



1 ol 



5 2 4 4 3 13 2 20 
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where n k indicates that k clusters of size n were observed. 

A model of disease transmission and marker mutation is defined, which is an extension of a 
birth and death process: a birth corresponds to a new infection and a death corresponds to a 
host dead or healed. A system of stochastic differential equations is used, and is governed by 
three rates: a the birth rate per case per year, 5 the death rate per case per year and 9 the 
mutation rate per case per year (see Appendix [B]). However interest parameters for biologists are 
the transmission rate a — 5, the doubling time log(2)/(a — S) and the reproductive value a/5, 
results will then be given for these parameters in the following. 

The likelihood of this model can not b e written expl i citly. However it is easy (although long) 



to simulate datasets from this model [see Tanaka et al. , 20061 . for details]. This example is then 



particularly suited for the use of ABC methods. We use the same simulation process than in 



Tanaka et al.1 |2006l | 



To compare simulated and observed data, two summary statistics considered as relevant by 
the biologists are used: the number of distinct genotypes in the studied sample g, and the gene 
diversity H defined by H = 1 — ^f=i(^') 2 > where n i is the srze of cluster with genotype i and 
n is the size of the sample. The values g b s and H i, s denote the two statistics obtained on the 
observed data. A simulated sample of size n associated to g and H is considered close to the 
observed data if y t \g bs — g\ + \H bs — H\ < e, where e is a suitable tolerance level. 

Prior distributions for the parameters a and 5 are uniforms on the interval [0, 5] with 
a > 5. Concerning the mutation rate 9, many studies have been conducted to estimate it 
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chain 1 


chain 2 


chain 3 


chain 4 


chain 5 


chain 6 


chain 7 


chain 1 


3.8 


22.6 


17.0 


13.2 


10.7 


7.6 


5.6 


chain 2 




13.9 


76.1 


58.8 


44.5 


34.0 


25.2 


chain 3 






16.2 


77.0 


58.9 


43.9 


33.6 


chain 4 








17.9 


77.3 


56.3 


45.0 


chain 5 










18.9 


74.3 


59.9 


chain 6 












19.9 


78.3 


chain 7 














20.4 



Table 4: ABC-PT for the tuberculosis example: the diagonal gives local acceptance rates, and 
outside the diagonal are given the proportions of accepted exchange moves. 



[see Tanaka et al. . 20061 . for references], which enable us to take the following informative prior: 
9 ~ JV(0.198, 0.06735 2 ) truncated on the left at 0. 

Concerning the ABC-PT algorithm, the vector of parameters is denoted by </> = (a, 5, 9), 
and (/ft is the vector of the chain i at time t, for i = 1, N. As in iTanaka et ail |2006| . the 
transition kernel for the first chain q\{.\(j>^ ) corresponds to a N((j){ , £), with: 




For the other chains, j (pf ^) corresponds to a Af(<fif with Sj = We took 

N = 7 chains, associated with temperatures between T\ = 1 and Tjy = 2 evenly spaced on a 
logarithmic scale. We used a Quad-Core Xeon E5320 1.86GHz processor with 8GB of RAM and 
we chose a minimum tolerance level of 0.01 to have reasonable acce ptance rates for gen erated 



dt-1) 



chains [to be compared with the tolerance level of 0.0025 taken by ITanaka et al. 



sequence of tolerance levels was evenly spaced on a logarithmic scale as follows: e\ 
0.03, ••• ,e 7 = 0.1. 



I200fi| . The 

= 0.01,e 2 = 



Results 



The algorithm was run for 20 000 iterations with a burn-in period of 2 000 iterations, it took 
about 4.5 days. The first chain was locally updated in 3.8 % of the iterations, and accepted 
on average 0.25 exchange move per iteration (5095 total moves). Concerning the set of N = 7 
chains, on average 3.07 exchange moves were accepted per iteration (among 7). Table 3] gives 
the local acceptance rates for each chain, and proportions of accepted exchange moves for each 
pair of chain (among the proposed exchange moves). 

Table [5] gives the posterior estimates of the interest parame ters, obtained from the first chain 



These esti mates are of the same orders than those obtained by ITanaka et al.l [2006l | . iSisson et al. 



20071 ] and lBluml [20101 ] . The mean transmission rate is around 0.60, describing the rate of increase 



of the number of cases in the population. The doubling time gives the same information, as it is 
just a transformation of the transmission rate. This parameter is used by biologists, as it can be 
easily interpreted (period of time required to double the percentages of cases in the population) . 
Here it is estimated around 1.35 years. The credibility interval of the reproductive value (number 
of new infections generated by one case) is quite large, as we have positive posterior probabilities 
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ABC-PT results 


Tanaka et al. [20061 results 


Blum [2010] results 




Mean 


Median 


95% CI 


Mean 


Median 


95% CI 


Posterior Mode 


95% CI 


Transmission rate 


0.59 


0.58 


(0.29,0.92) 


0.69 


0.68 


(0.38-1.08) 


0.56 


(0.16-0.95) 


Doubling time 


1.35 


1.20 


(0.76,2.41) 


1.08 


1.02 


(0.64-1.82) 


1.16 


(0.73-4.35) 


Reproductive rate 


5.98 


2.29 


(1.20,17.35) 


19.04 


3.43 


(1.39-79.71) 


4.00 


(2.24-117.45) 


Mutation rate 


0.25 


0.25 


(0.15,0.35) 













Table 5: ABC-PT for the tuberculosis example: posterior estimates of the parameters of interest, 
on the 18 000 post-burn-in iterations of the first chain: mean, median and 95% credibility interval. 



Transmission rate Doubling time Reproductive rate 




Figure 11: Posterior densities of the transmission rate, the doubling time, and the reproductive 
rate, obtained by the algorithms ABC-PT, ABC-MCMC, ABC-MCMC and standard ABC. 



for few very small values of 5, corresponding to high values of this rate. The mean is then 
unstable. However, most of the posterior values are around the median (see Figure [TTj). which 
should then be preferred compared to the mean. 

We then compared the results of the ABC-PT algorithm with those of the standard ABC, 
ABC-MCMC and ABC-PMC algorithms. The same minimum tolerance level of 0.01 is used 
for all the algorithms. Concerning standard ABC, 1000 samples were generated; concerning 
ABC-MCMC 350 000 iterations were performed with a burn-in period of 50 000 iterations; 
and concerning ABC-PMC 10 po pulations of 1000 s amples were generated using the following 
sequence of tolerance levels [from ISisson et all 120071 ] : E\ = l,£io = 0.01 and = 0.5(efc_i + 
£io), k = 2, . . . , 9. Note that for th e ABC-PMC algorithm we used weighted covariance matrices 
as suggested in lFilippi et al.l [20111 ] to get optimal kernels. 

Posterior estimates of parameters obtained by these three algorithms were close to those of 
the ABC-PT, see Figure [lTJ 



Standard ABC algorithm: an average number of 408 data generation steps were needed to 
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obtain one sample. This algorithm took approximately 6.5 days. 

• ABC-PMC algorithm: 203 data generation steps were necessary on average to obtain one 
particle and computing time was about 4.5 days. 

• ABC-MCMC algorithm: 9 days were required. The chain was updated in 3.8 % of iter- 
ations. The obtained simulations are more correlated than those of the first chain of the 
ABC-PT algorithm. See Table which gives for the a parameter the autocorrelations 
of the chain obtained by ABC-MCMC, and of the first chain obtained by ABC-PT, with 
or without thinning. Similar results were obtained for the parameters 5 et 6. Hence an 
improvement of the ABC-PT over the ABC-MCMC in terms of autocorrelations was ob- 
served. If one is interested in having samples not too correlated, it appears from Table [6] 
that in order to have autocorrelations decreasing rapidly and non significant from the fifth 
order, every 500th sample must be kept for the ABC-MCMC algorithm, and every 15th 
sample must be kept for ABC-PT. As a consequence, 583 (350 000/600) and 117 (20 000 
x 7 / 1200) data generation steps are required on average to obtain one realization of the 
target posterior from the ABC-MCMC and the ABC-PT respectively. Figure [12] gives the 
autocorrelation plot for the ABC-PT when all the samples are kept, and when a thinning 
of 15 is used. 





thinning 


Nb of samples 


Order 1 


Order 10 


Order 20 




none 


300 000 


0.991 


0.919 


0.850 


ABC-MCMC 


100 


3 000 


0.513 


0.107 


0.032 




500 


600 


0.162 


0.056 


0.017 




none 


18 000 


0.812 


0.456 


0.296 


ABC-PT (chain 1) 


10 


1 800 


0.473 


0.007 


-0.040 




15 


1 200 


0.402 


0.005 


0.029 



Table 6: For the a parameter: autocorrelations of order 1, 10 and 20 of the chain obtained by 
ABC-MCMC, and of the first chain obtained by ABC-PT (with or without thinning the chains). 

Note that the obtention of correlated samples is part of any MCMC method, and satisfactory 
estimates can nevertheless be achieved. 

5 Discussion 

The ABC-PT algorithm developed in this paper is a new likelihood-free method based on the 
Markov Chain Monte Carlo theory. On the studied examples, results obtained by the algorithm 
are better than those obtained by the classical ABC-MCMC algorithm, in terms of exploration 
of the parameter space, and of autocorrelations of the generated chain. The performances of this 
new algorithm appear equivalent to those of the ABC-PMC and of the ABC-SMC, which are 
sequential methods. These sequential methods have the advantage to generate non-correlated 
samples. However, on the modified toy example the ABC-PMC appeared to slightly under- 
cover posterior distribution tails and the local mode, while the ABC-SMC appeared to slightly 
under-cover the global mode. The choice between ABC-PT, ABC-PMC and ABC-SMC should 
then depend on the simulation's objective. We would recommend to use the ABC-PT or the 
ABC-SMC if we suspect that the posterior of interest is multi-modal, with small modes in low 
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ABC-PT, All samples 



ABC-PT, thinning of 15 



I 1 I I 1 I 1 



23 
Lag 



10 15 20 25 30 
Lag 



Figure 12: For the a parameter, autocorrelation plots of the first chain obtained by ABC-PT: 
on the left all samples are kept, on the right every 15th sample is kept. 



probability areas. Concerning the computational cost of the ABC-PT, a convergence study 
should be achieved, taking into account the number N of chains, the temperature spacings [see 



Atchade et al. . 20ld |. as well as the autocorrelations. 



From a theoretical point of view the N chains of the ABC-PT all together can be considered as 
a Markov chain, and the ABC-PT algorithm generates a Markov chain for (z, 6) with stationary 
distribution ir*(z,0 \ x). The chain obtained for E\ is the chain of interest, because it provides 
samples corresponding to tt Ei (zi,9i \ x) which is the target distribution. The marginal in 6\ 
7r £l (#i | x) can be considered as a good approximation of the posterior of interest 7r(. | x) if E\ is 
small enough and S well chosen according to our data. 

From a practical point of view, the ABC-PT requires calibration of the sequence of tolerance 
levels, and of the sequence of temperatures. Practical guidelines have been given in the modified 
toy example, see section [3~T1 As seen in section [331 it is possible to improve the acceptance rate 
of the exchange moves by considering a partition of the tolerance levels space to choose the pairs 
of chains to exchange. The total number of accepted exchange moves was increased by about 
83%, with only 13% of additional CPU time. 

In this paper the ABC-PT algorithm only uses one type of global move, which is the ex- 
change move. However, it is always possible to define othe r types of global moves, like those 
of the Evolutionary Monte Carlo of iLiang and W ong 2001 1 for instanc e. It is also pos s ible t o 
define delayed rejection moves, see for instance iGreen and Miral [200 1| and iJasra et al.l 2007 1. 
It would be of particular interest to use the simulations generated by the other chain s than 
th e first chain, by dey e lopin g a method in the same spirit than those of Beaumont et al. 2002| 
or iBlum and Francois! 2010l |. Moreover, as proposed by a reviewer, we can take advantage of 



using a schedule of varying s\ for the first chain of interest, to improve its mixing. Finally, we 
think that using a pre-determined sequence of tolerance levels is not the best way to obtain a 
pre-determined number of sampl es with a tolerance level as small as possible, for a given com- 



putational time. In the lines of iMuller et al.l [2004| it would then be interesting to use a non 
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pre-determined sequence of tolerance levels, and hence to have an adaptive algorithm. 
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A Matrices of exchange rates for ABC-PT with and without rings 

Tables [7] and [8] compare accepted exchange rates between chains based on the modified toy 
example, using ABC-PT algorithms with or without rings. The rate between chains % and j 
is equal to the number of accepted exchange moves between these chains divided by the total 
number of iterations. It is observed that the use of rings allows more accepted exchange moves, 
especially between high tempered chains. 
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10 


11 


12 


13 


14 


15 


chain 1 


0.16 


0.12 


0.09 


0.06 


0.04 


0.03 


0.02 


0.01 


0.01 


0.01 


0.01 


0.00 


0.00 


0.00 


chain 2 
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0.10 


0.07 


0.06 


0.04 


0.03 


0.02 


0.01 


0.01 


chain 8 
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chain 9 
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chain 10 
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0.19 


0.13 


0.09 


0.06 


chain 11 






















0.22 


0.15 


0.11 


0.08 


chain 12 
























0.24 


0.18 


0.14 


chain 13 


























0.43 


0.34 


chain 14 




























0.77 



Table 7: ABC-PT with 3 rings for the modified toy example: the diagonal gives local acceptance 
rates, and outside the diagonal are given the proportions of accepted exchange moves. 



B Formula for the Tuberculosis example 



We used the same notations than Tanaka et al.l |2006| . The number of cases of genotype i at 



time t is denoted by Xi(t), G(t) is the number of distinct genotypes that have existed in the 
population up to and including time t, and N(t) is the total number of cases at time t. 

G(t) 

N(t) = ^2x i (t). 

i=l 

The genotypes are labeled 1,2,3,... for convenience, although the ordering has no meaning, 
except that % = 1 represents the parental type from which others are descended (directly or 
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chain 1 


0.10 


0.08 


0.06 


0.04 


0.03 
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chain 2 
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chain 3 
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chain 7 
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0.02 


0.01 


0.02 


chain 8 
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0.08 
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0.04 


0.03 


0.02 


0.02 


chain 9 
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0.08 


0.06 


0.04 


0.03 


0.02 


chain 10 




















0.10 


0.08 


0.06 


0.04 


0.03 


chain 11 






















0.22 


0.15 


0.06 


0.04 


chain 12 
























0.11 


0.08 


0.05 


chain 13 


























0.10 


0.07 


chain 14 




























0.10 



Table 8: ABC-PT for the modified toy example: the diagonal gives local acceptance rates, and 
outside the diagonal are given the proportions of accepted exchange moves. 



indirectly). The three rates of the system are the birth rate per case per year a, the death rate 
per case per year 5, and the mutation rate per case per year 9. iTanaka et all |2006| define the 
following probabilities: 

P i:X (t) = P(Xi(t) = x), 
P n (t) = P(N(t) = n) and P g (t) = P(G(t) = g). 
The time evolution of Pi tX (t) is described by the following differential equations: 

dPi, x {t) 



dt 



-{a + 5 + 9)xP iiX (t) + a{x - l)P ii!B _i(t) 



dPj, {t) 
dt 



no event birth 

+ (<J + 0)(x + l)iVu(t), x = l,2,... (4) 

* v ' 

death or mutation 

(S + 9)P ijl (t). (5) 



Initially there is only one copy of the ancestral genotype, hence the initial conditions are: 
P i>x (0) = for all (i,x), except Pl,i(0) = 1, and for i = 2,3,4,..., P»,o(0) = 1. 
To take into account the creation of new genotypes, the probability P g (t) is described by the 
following differential equations (only a mutation can create a new genotype): 

= -0N{t)P g (t) + 6N(t)P g -i(t), 5 = 2,3,4,... (6) 
at v / v v ' 

no mutation mutation 

dPi(t) 



dt 



ON^P^t). (7) 



The initial condition is G(0) = 1. Let t g be the time when a new genotype g is created, we have 

Pg,l(tg) = P(Xg(tg) = 1) = 1, & Ud Pg^tg) = P ' (X g (t g) = X) = for X ^ 1 . 

The total number of cases N(t) is described by the following differential equations (only a birth 
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or a death influence changes in this number): 

dP n (t) 



dt 



= -(a + <S)nP n (t) + a(n-l)P n _i(t) (8) 

V «, ' V «, ' 

no birth, no death birth 

+ 6{n + l)P n+1 {t), n=l,2,... 



death 

dP (t) 



dt 

The initial conditions are Pi(0) = 1 and P„(0) = for n / 1. 



SPi(t). (9) 
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